Abstract

Single-nucleotide polymorphisms (SNPs, >2000) were discovered by using RNA-seq and allele-specific
sequencing approaches in pigeonpea (Cajanus cajan). To make SNP genotyping cost-effective, successful
competitive allele-specific polymerase chain reaction (KASPar) assays were developed for 1616 SNPs and
referred to as PKAMs (pigeonpea KASPar assay markers). Screening of PKAMs on 24 genotypes [23 from
cultivated species and 1 wild species (Cajanus scarabaeoides)] defined a set of 1154 polymorphic markers
(71.4%) with a polymorphism information content (PIC) value from 0.04 to 0.38. One thousand and
ninety-four PKAMs showed polymorphisms between parental lines of the reference mapping population
(C. cajan ICP 28 x C. scarabaeoides ICPW 94). By using high-quality marker genotyping data on 167 F2
lines from the population, a comprehensive genetic map comprising 875 PKAMs with an average
inter-marker distance of 1.11 cM was developed. Previously mapped 35 simple sequence repeat markers were
integrated into the PKAM map and an integrated genetic map of 996.21 cM was constructed. Mapped
PKAMs showed a higher degree of synteny with the genome of Glycine max followed by Medicago truncatula 
and Lotus japonicus and least with Vigna unguiculata. These PKAMs will be useful for genetics research and 
breeding applications in pigeonpea and for utilizing genome information from other legume species. 
Key words: pigeonpea; SNP; linkage map; comparative genomics; molecular breeding 



1. Introduction

Recent advances in genomics have provided various
opportunities to a number of crop species of significant
agronomical importance for enhancing crop productivity.
One of the foremost applications of genomics in breeding
is the prediction of a phenotype from the genotype and
the process is called genomics-assisted breeding (GAB). 1
Several success stories of GAB are available in many
temperate cereal crops 2 and some legume species 3 also. However,
legume crops like pigeonpea (Cajanus cajan 
L. Millspaugh), which is grown in ~5 million hectares 
in the developing countries of Asia and Africa have 
remained untouched by GAB. This may be attributed 
mainly to two reasons: (i) paucity of genomic tools 
and (ii) narrow genetic diversity in the gene pool of 
the pigeonpea. Following 5 years of intensive research 
efforts and investment in genomics, however, a signifi- 
cant amount of genomic resources such as a large col- 
lection of expressed sequence tags (ESTs) or transcript 
reads, 4-7 large-scale molecular markers including 
simple sequence repeats (SSRs), 8,9 single-feature
polymorphisms, 10 single-nucleotide polymorphisms
(SNPs) 5-7 and diversity array technology (DArT) 11,12
markers have been developed. Very recently, the 
draft genome sequence has also become available. 13 
Identification of molecular markers for applying in a 
breeding programme requires the development of a 
genetic map and quantitative trait locus (QTL)
analysis. In the case of pigeonpea, although a few
genetic maps have been developed, 9,12,14,15 the
marker density in those maps is very low. A major 
challenge, therefore, before the pigeonpea commu- 
nity is the development of a saturated genetic map. 
In this context, SNP markers have attracted significant 
attention as these markers represent the most
abundant class of polymorphisms in genomes and are
amenable for high-throughput genotyping. 3,16 In
general, the implementation of SNPs for genetic 
studies involves a three-step process: (i) SNP discovery 
after aligning the ESTs, sequence reads generated by 
Sanger or next generation sequencing (NGS) tech- 
nologies for different genotypes in a given species; 
(ii) validation of SNPs to distinguish DNA polymorphisms
of actual allelic variants from those of other
biological phenomena, such as gene duplication events,
i.e. paralogous or homeologous genes, as well as those
of technical errors, primarily sequencing errors, in
case SNPs have been identified using in silico 
approaches; (iii) SNP genotyping of germplasm collec- 
tion or genetic/breeding populations. A wide range of 
molecular techniques suitable for pursuing the men- 
tioned three steps have been available, 17 each charac- 
terized by a distinct cost scale and throughput 
capacity, and utilizing different technology platforms. 
NGS technologies are bringing us the capacity to iden- 
tify, at affordable cost, large numbers of SNPs for even 
non-model species. Similarly, the availability of a 
number of SNP genotyping platforms, in a high- 
throughput manner, is making SNP genotyping cost- 
effective. 

Depending on the sample size and number of SNPs
to be analysed, medium- to high-throughput assay
platforms such as BeadXpress and GoldenGate assays
from Illumina Inc. with a varying set of multiplexes
(96, 384, 768 or 1536 SNPs per assay) are available.
Such platforms have been developed and used in 
several crop species such as barley, 18 wheat, 19 
maize 20 and oil seed rape, 21 and legumes such as 
soybean, 22 cowpea 23 and pea. 24 Furthermore, in 
some crops like maize, Infinium assays with the
capacity of genotyping ~50 000 SNPs have become
available. 25 Such platforms, however, are cost-effective
only when a minimum of 96, 384, 768, 1536 or
thousands of SNPs are used with a large number of
samples. In cases like marker-assisted selection
where only a few markers are required for genotyping
large-scale segregating populations, Illumina-based
genotyping assays do not seem to be cost-effective.
In such cases, the competitive allele-specific polymer- 
ase chain reaction (PCR) (KASPar) assay from 
KBiosciences (www.kbioscience.co.uk) seems to be 
an effective marker assay. The KASPar genotyping 
assay is a competitive allele-specific PCR-based fluor- 
escent SNP genotyping system. It is a user-friendly 
system that provides flexibility in the numbers of 
SNPs and genotypes to be used for assays. Details 
about this technology are available at http://www. 
kbioscience.co.uk/reagents/KASP.html. Because of 
the importance of KASPar assays in genotyping more 
samples with a few SNPs, they have been developed 
in wheat, 26 common bean 27 and chickpea. 28 

With an objective to develop a flexible and cost- 
effective SNP genotyping platform in pigeonpea, 
this study reports the compilation of informative SNP 
data sets, development and characterization of KASPar 
assays, and development of an SNP-based genetic 
linkage map of pigeonpea and its use for comparative 
genomics with closely related legume species like 
soybean (Glycine max), cowpea (Vigna unguiculata),
Medicago truncatula and Lotus japonicus. 

2. Materials and methods 

2.1. Mapping population and DNA isolation

Two Cajanus spp. accessions, ICP 28 from cultivated
pigeonpea (C. cajan) and ICPW 94 from the wild relative
of pigeonpea (C. scarabaeoides), were used as crossing
parents for the development of an F2 population of
167 individuals. Accordingly, a single plant of the
ICP 28 accession was used as the female parent and
crossed with the pollen parent ICPW 94 plant, and F1s
were produced. All F1s were selfed under nylon bags
and grown at Patancheru in southern India (17°N).
A single F1 plant having the highest number of F2
seeds was selected to develop a mapping population of
167 F2 individuals. To characterize the developed SNP
markers, a set of 24 genotypes was screened for
polymorphism (Supplementary Table S1). These
genotypes represent parents of 14 mapping populations
which are segregating for various agronomically
important traits. Total DNA from the parents of
different mapping populations and F2 lines derived from
ICP 28 x ICPW 94 was isolated from two to three
young leaves following the standard DNA isolation
protocol as mentioned in the study of Cuc et al. 29
The DNA quantity for each sample was assessed on
0.8% agarose gel.

2.2. Identification of SNPs

The complete Illumina GA IIx data set comprised
128.9 million 36 bp short single-end reads
from 12 genotypes (ICPL 87119, ICPL 87091, BSMR
736, TAT 10, ICP 7035, TTB 7, ICPL 332, ICPL
20096, ICPB 2049, ICPL 99050, ICP 28 and ICPW
94; Table 1). Identification of SNPs from assembled
data was carried out using the Alpheus software
system. 30 SNPs were identified on the basis of
alignment of sequence reads generated from each of the
counter genotypes against the reference assembly,
i.e. the pigeonpea transcriptome assembly, which was
developed by assembling four different sequence
data sets, 7 allowing not more than two mismatches.
Based on the alignment results, variants at
a particular nucleotide position were identified.
Significant variants were selected based on two
criteria: (i) the allele frequency between two genotypes
>0.8 (the number of a specific allele divided by the
number of all alleles for the specific SNP between
two genotypes should be higher than 80%) and (ii)
the number of tags aligned to the reference >5.
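
The two selection criteria can be sketched as a simple filter. This is an illustration only and not the Alpheus implementation: it assumes per-genotype read counts for each allele at one position, interprets criterion (i) as the predominant allele exceeding 80% of the reads within a genotype, and uses hypothetical function names.

```python
from typing import Dict, Optional


def major_allele(counts: Dict[str, int],
                 min_freq: float = 0.8,
                 min_depth: int = 5) -> Optional[str]:
    """Return the predominant allele if it passes both thresholds, else None."""
    depth = sum(counts.values())
    if depth <= min_depth:          # criterion (ii): more than 5 aligned tags
        return None
    allele, n = max(counts.items(), key=lambda kv: kv[1])
    if n / depth <= min_freq:       # criterion (i): allele frequency above 0.8
        return None
    return allele


def is_candidate_snp(counts_a: Dict[str, int], counts_b: Dict[str, int]) -> bool:
    """A position is a candidate SNP when both genotypes give a confident,
    but different, predominant allele."""
    a = major_allele(counts_a)
    b = major_allele(counts_b)
    return a is not None and b is not None and a != b
```

For example, read counts of {"A": 9, "G": 1} in one parent and {"G": 8} in the other would be retained, whereas a position covered by only four tags in either parent would be dropped.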

2.3. KASPar genotyping assay 

For each putative SNP, the criteria used for the
selection of high-quality SNPs for KBioscience competitive
allele-specific PCR (KASPar) assay 31 included: (i) an
SNP flanked by at least 50 bases on either side; (ii)
frequency difference between the two genotypes >5;
(iii) read depth >5. For each SNP, two allele-specific
forward primers and one common reverse primer
were designed. By using these primers, KASPar assays
were performed in a final reaction volume of 5 µL
containing 1 x KASP reaction mix (KBioscience,
Hoddesdon, UK), 0.07 µL of assay mix (12 µM each
allele-specific forward primer and 30 µM reverse
primer) and 10-20 ng of genomic DNA. The Gene
Pro Thermal cycler (Bioer Technology, Hangzhou,
China) was used for the following cycling conditions:
15 min at 94°C; 10 touchdown cycles of 20 s at
94°C and 60 s at 65-57°C (the annealing temperature
being reduced by 0.8°C per cycle); and 26-35 cycles
of 20 s at 94°C and 60 s at 57°C. Fluorescence
detection of the reactions was
performed using an Omega Fluorostar scanner (BMG



Table 1. Summary on the identification of SNPs in Cajanus spp. accessions

Genotypes     Illumina GA IIx reads   Total number of SNPs   Number of unique genes with identified SNPs
ICPL 87119        7 182 619            7488                   3116
ICPL 87091        8 977 567
BSMR 736         11 065 219            2115                   1454
TAT 10            7 932 691
ICP 7035         13 223 516            1256                    983
TTB 7             4 122 216
ICPL 332         16 361 115            2452                   1819
ICPL 20096        9 507 797
ICPB 2049        11 494 670            1892                   1435
ICPL 99050       13 498 156
ICP 28            9 721 562            1910                   1352
ICPW 94 a        15 828 791
Total           128 915 919           17 113                 10 159

TOGs
ICP 28                                  752                    670
ICPW 94 a

SNP and gene counts are given once for each pair of genotypes (cross).
a ICPW 94 is an accession of C. scarabaeoides, a wild relative
of pigeonpea (C. cajan).



LABTECH GmbH, Offenburg, Germany) and the data 
were analysed using the KlusterCaller 1.1 software 
(KBioscience). Details on the KASPar principle, ampli- 
fication of targeted region, fluorescence detection and 
allele calling are available at http://www.kbioscience. 
co.uk/reagents/KASP_manual.pdf. The polymorphism 
information content (PIC) values for developed markers 
across 24 genotypes (Supplementary Table S1) were 
calculated by using the PowerMarker software
(http://statgen.ncsu.edu/powermarker/).
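
As a sketch of what a PIC value measures, the standard definition can be computed directly from allele frequencies. This assumes the usual formula, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the function name is hypothetical and the code is not taken from PowerMarker.

```python
from itertools import combinations
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content for one marker.

    freqs: allele frequencies at the marker; they must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term for matings between identical heterozygotes.
    cross_term = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term
```

For a biallelic SNP the value peaks at 0.375 when both alleles have a frequency of 0.5, which is consistent with the upper end of the PIC range (0.38) reported for the PKAMs.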

2.4. Linkage mapping 

Genotyping data generated using KASPar assays on
167 F2 individuals of the ICP 28 x ICPW 94 population
were subjected to linkage analysis using JoinMap
version 4.0 with the 'regression mapping algorithm'. 32
Prior to linkage analysis, marker segregation data
were subjected to the goodness-of-fit test (χ2) to
assess deviations from the expected Mendelian
segregation ratio of 1:2:1 at a 5% level of significance. Map
calculations were performed at a logarithm (base 10)
of odds (LOD) value of 5.0, recombination frequency
<0.40 and χ2 jump threshold for removal of loci =
5. A Kosambi map function was used to convert the
recombination frequency into cM distances 33 and
the third round was set to allow the mapping of an
optimum number of loci in the genetic map. Mean
χ2 contributions or average contributions to the
goodness of fit of each locus were also checked to
determine the best fitting position for markers in
genetic maps. The markers showing negative map
distances or a large jump in mean χ2 values were
subsequently discarded. The graphical maps of the linkage
groups (LGs) were constructed by using MapChart
version 2.2. 34
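
The segregation test and the map function described above can be sketched as follows. This is a minimal illustration and not JoinMap code: the function names are hypothetical, genotype classes are assumed to be counted as the two homozygotes and the heterozygote, and 5.991 is the χ2 critical value for 2 degrees of freedom at the 5% level.

```python
import math

CHI2_CRIT_DF2_5PCT = 5.991  # chi-square critical value, 2 d.f., alpha = 0.05


def chi_square_121(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Goodness-of-fit statistic against the 1:2:1 ratio expected in an F2."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotype calls")
    expected = (total / 4, total / 2, total / 4)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_mendelian(n_aa: int, n_ab: int, n_bb: int) -> bool:
    """True when the marker shows no significant segregation distortion."""
    return chi_square_121(n_aa, n_ab, n_bb) <= CHI2_CRIT_DF2_5PCT


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for a recombination frequency r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))
```

With 167 F2 individuals, counts of 42:83:42 pass the test, whereas 80:60:27 would be flagged as distorted; a recombination frequency of 0.10 corresponds to about 10.14 cM.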

2.5. Comparative genome analysis 

DNA sequences for corresponding mapped SNP 
markers were used for comparative analysis with 
the genetic map of cowpea 23 and the genome assem- 
blies of soybean (ftp://ftp.jgi-psf.org/pub/JGI_data/ 
phytozome/v7.0/Gmax/assembly), M. truncatula 
(http://www.medicagohapmap.org/downloads.php) 
and L. japonicus (ftp://ftp.kazusa.or.jp/pub/lotus/ 
lotus_r2.5/pseudomolecule). Vmatch 35 was used to 
identify reciprocal best matches between the pigeon- 
pea sequences and other legume sequences. Hits 
matching a minimum of 70% sequence identity were 
retained for the comparative study. Identification of 
homologous blocks was performed using i-ADHoRe 
v2.1. 36 For the purpose of developing Circos images, 
cM distances on the pigeonpea LGs were scaled up by 
a factor of 250 000 to match similar bp lengths of 
the chromosomes of other legumes' genomes. 
Synteny blocks were visualized by using Circos26. 37 
Scales along the outer edges of the pigeonpea and 
cowpea LGs show actual cM distances, whereas those 
along the outer edges of the soybean, Medicago and 
Lotus chromosomes are in Mb. 
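
The matching and scaling steps can be sketched in a few lines. This is a hedged illustration that does not call Vmatch, i-ADHoRe or Circos: hits are assumed to be available as plain (query, subject, percent identity, score) tuples, and the function and marker names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, percent identity, score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, _identity, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best(forward: List[Hit], reverse: List[Hit],
                    min_identity: float = 70.0) -> List[Tuple[str, str]]:
    """Pairs that are each other's best hit after the 70% identity filter."""
    fwd = best_hits(h for h in forward if h[2] >= min_identity)
    rev = best_hits(h for h in reverse if h[2] >= min_identity)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


def cm_to_pseudo_bp(position_cm: float, factor: int = 250_000) -> int:
    """Scale a genetic position so LGs plot at chromosome-like lengths."""
    return int(round(position_cm * factor))
```

Under this scaling, a position of 124.25 cM is drawn at 31 062 500 pseudo-bp, so the longest pigeonpea LG is comparable in size to a legume chromosome of roughly 31 Mb.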

3. Results 

3.1. Development of a mapping population

Although a set of 72 F2 plants was available from
an earlier cross (C. cajan ICP 28 x C. scarabaeoides
ICPW 94) that was used to develop an SSR-based
genetic map, 9 a new cross with the same accessions
was made to develop a bigger population (167 F2
lines) for developing a high-resolution genetic map.

3.2. Assembly of informative SNPs 

With a goal of increasing the cost-effective and 
high-throughput genetic marker repertoire in pigeon- 
pea, the following two different sequence resources 
were surveyed for the presence of SNPs: (i) Illumina
GA IIx transcript sequence data and (ii) tentative
orthologous genes (TOGs) of closely related legumes. 

3.2.1. SNPs from Illumina GA IIx transcript sequence data

For the identification of SNPs, 128.9 million Illumina
reads of 12 different genotypes (ICPL 87119, ICPL 87091,
BSMR 736, TAT 10, ICP 7035, TTB 7, ICPL 332, ICPL 20096,
ICPB 2049, ICPL 99050, ICP 28 and ICPW 94) were aligned
against the transcriptome assembly (CcTA v2). 7 The CcTA v2
comprised 21 434 transcriptome assembly contigs
(TACs) developed from transcriptome data sets from
21 pigeonpea genotypes (128.9 million Illumina GA
IIx reads from 12 genotypes, 2.19 million FLX/454
reads from 3 genotypes and 18 353 Sanger ESTs
from 6 genotypes). 7 Variants were identified using
the Alpheus program 30 by comparing the sequence
tags from two genotypes of a given mapping population
combination. In total, a set of 17 113 SNPs in
10 159 unique sequences were identified between
six crosses (Table 1). The number of SNPs in an
individual cross ranged from 1256 (TTB 7 x ICP 7035)
to 7488 (ICPL 87119 x ICPL 87091) (Table 1).
However, only six SNPs were found common across
three populations (ICPL 20096 x ICPL 332, TTB 7 x
ICP 7035 and BSMR 736 x TAT 10). For the ICP 28
and ICPW 94 combination, a total of 1910 SNPs
were identified in 1352 TACs. By considering only
one SNP per TAC (gene) and primer designing criteria
of KASPar assays, 1167 SNPs were further selected.

3.2.2. SNPs from TOGs

After sequencing 670 TOGs in the ICP 28 and ICPW 94
accessions, a set of 752 SNPs were identified and used
in designing the GoldenGate assay in a separate
study. 38 From these 752 SNPs, a total of 660 SNPs
satisfied the required primer designing criteria for
KASPar assays.

After combining the above-mentioned data sets, a
non-redundant set of 1827 SNPs (1167 from
Illumina GA IIx transcript data, referred to as
GAIIx-SNPs, and 660 TOG-SNPs) was compiled.

3.3. Development and validation of KASPar assays 

A total of 1827 non-redundant SNPs were used for
the development of KASPar assays (Supplementary
Table S2). However, successful assays with scorable
allele calls could be developed for 1616 SNPs (88.4%).
These marker assays have been referred to as
PKAMs (pigeonpea KASPar assay markers). Screening of
all 1616 PKAMs on 24 pigeonpea genotypes, including
23 cultivated genotypes and one wild-type accession,
ICPW 94, representing parents of 14 mapping populations
(Supplementary Table S1), defined a subset
of 1154 polymorphic markers (71.4%). Among
these polymorphic PKAMs, 1043 were polymorphic
exclusively in wild species. Data obtained from 24
genotypes were used to calculate the PIC value of
each PKAM marker, and PIC values ranged from
0.04 to 0.38 with an average of 0.09
(Supplementary Table S2). In terms of the parental
polymorphism of different mapping populations,
polymorphism rates varied considerably, depending
on the crossing parental lines under comparison, from
a low of 14 polymorphic PKAMs (ICP 8863 x ICPL
20097) to a high of 1094 polymorphic PKAMs (ICP
28 x ICPW 94) (Supplementary Table S1).

3.4. SNP-based genetic map 

With a goal of developing an SNP-based genetic
map in pigeonpea, genotyping data were obtained
on 167 progenies for all the 1094 polymorphic
PKAMs. However, only the high-quality data obtained
for 1008 PKAMs were considered for further analysis.
Genotyping data obtained for all 1008 PKAMs were
tested for the Mendelian/non-Mendelian segregation
pattern. As a result, the 33 PKAMs showing
non-Mendelian inheritance were removed from further
analysis. Subsequently, genotyping data for 975
PKAMs (470 PKAMs based on GAIIx-SNPs and 505
PKAMs from TOG-SNPs) were used for linkage
analysis, with JoinMap 4.0 used to construct the genetic
linkage map. 32

In summary, 11 LGs were generated using an LOD
threshold value of 5.0, which is in agreement with
the haploid chromosome number (11) in pigeonpea.
A total of 55 PKAMs that failed to be assigned to
these 11 LGs were not incorporated in further analyses.
While assigning the order of PKAMs in different
individual LGs, map positions could not be assigned for
45 PKAMs. As a result, the developed genetic map
contains 875 (444 GAIIx-SNP and 431 TOG-SNP)
loci on 11 LGs (Supplementary Fig. S1), ranging from
25.7 cM (CcLG05) to 124.25 cM (CcLG11) in length,
and a total map length of 967.03 cM (Supplementary
Table S3). The number of markers per LG ranges
from 25 (CcLG05) to 134 (CcLG02), with an average
of 79.54 (Table 2 and Supplementary Fig. S1). The
highest marker density with an inter-marker distance
of 0.84 cM was observed on CcLG02, while CcLG09
had the lowest marker density with average marker
spacing 1.79 cM. Most of the intervals between
adjacent markers were smaller than 20 cM on the genetic
linkage map; only two intervals were larger, i.e.
23.56 cM between PKAM0211 and PKAM0417 on
CcLG02 and 28.85 cM between PKAM0671 and
PKAM0543 on CcLG06.
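
The marker counts and densities reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given above; the function names are hypothetical, and mean spacing is taken as map length divided by the number of loci, which reproduces the reported values.

```python
def marker_funnel(polymorphic: int = 1094, high_quality: int = 1008,
                  distorted: int = 33, unlinked: int = 55,
                  unplaced: int = 45) -> dict:
    """Follow the PKAMs from parental polymorphism to final map positions."""
    tested = high_quality - distorted      # markers entering linkage analysis
    mapped = tested - unlinked - unplaced  # markers with a map position
    return {
        "dropped_for_quality": polymorphic - high_quality,
        "tested": tested,
        "mapped": mapped,
    }


def mean_spacing_cm(map_length_cm: float, n_loci: int) -> float:
    """Average inter-marker distance as map length divided by number of loci."""
    return map_length_cm / n_loci
```

Run on the reported numbers, this gives 975 markers entering linkage analysis and 875 mapped loci, and 967.03 cM / 875 loci gives the overall average spacing of about 1.11 cM.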



3.5. Integrated genetic map 

With an objective to provide anchor points to
integrate an SSR-based genetic linkage map with the
SNP-based genetic linkage map, the newly developed
mapping population (ICP 28 x ICPW 94) was also
genotyped with targeted SSRs (2-5) from each LG
of the SSR-based genetic linkage map previously
developed by Bohra et al. 9 In this context, genotyping
data were generated for a total of 35 SSRs. All 35 SSRs
were mapped onto the 11 LGs (CcLG01-CcLG11). The
number of SSR markers per LG varied from two
(CcLG02, CcLG08, CcLG10 and CcLG11) to five
(CcLG03 and CcLG04) (Table 2). After integration of
the SSR markers, the total distance of the integrated
genetic linkage map increased by 29.18 cM (Fig. 1).
Subsequently, the integrated genetic map was
compared with the SSR-based genetic linkage map and
the marker order was consistent between similar
LGs. However, in the case of CcLG03, out of five
common markers, two markers, namely CcM1593
and CcM2045, had different positions. All pairwise
comparisons between the integrated LGs and
SSR-based LGs are shown in Supplementary Fig. S2.
As expected, the marker order and distances were
well conserved between the integrated genetic map
and the SNP-based genetic map.



Table 2. Features of SNP-based and integrated pigeonpea genetic maps

                SNP-based map                                   Integrated map (SNPs + SSRs)
Pigeonpea LG    Size (cM)   Number    Average inter-loci        Size (cM)   Number    Average inter-loci
                            of loci   distance (cM)                         of loci   distance (cM)
CcLG01            95.83       79      1.21                       107.29       82      1.31
CcLG02           112.55      134      0.84                       112.55      136      0.83
CcLG03           107.85      118      0.91                       120.88      123      0.98
CcLG04            70.29       65      1.08                        70.77       70      1.01
CcLG05            25.7        25      1.03                        28.41       28      1.01
CcLG06            81.28       82      0.99                        80.93       85      0.95
CcLG07            93.22       84      1.11                        94.14       88      1.07
CcLG08            96.48       71      1.36                        97.43       73      1.33
CcLG09            96.75       54      1.79                        96.81       58      1.67
CcLG10            62.83       69      0.91                        62.83       71      0.88
CcLG11           124.25       94      1.32                       124.18       96      1.29
Total            967.03      875      1.11                       996.21      910      1.09

Figure 1. An integrated genetic map of pigeonpea. This genetic map is developed on the F2 mapping population derived from ICP 28 x ICPW
94. The map comprises 910 loci (875 PKAMs and 35 SSRs) in which 444 PKAMs were developed based on GAIIx-SNPs, shown in red; 431
PKAMs were developed based on TOG-SNPs, shown in green; and 35 SSRs previously mapped by Bohra et al. 9 are shown in black.

3.6. Genome relationships of pigeonpea with closely related legume species

For comparative genome analysis and to identify
conserved synteny between genomes of pigeonpea
and other related legume species, we combined
both the genetic map position information for
pigeonpea loci and genome sequence information
of closely related species of different clades. A set
of 875 mapped loci for which both genetic map
positions and sequence information were available
(Supplementary Table S3) was compared with the
genome assemblies of soybean (Glyma1), Medicago
(Mt 3.5) and Lotus (Lj 2.5 pseudomolecules) and with
the transcript-specific genetic map of cowpea. 23 In
the comparison of pigeonpea with soybean, the
highest percentage of sequence similarity was
identified. As expected, each pigeonpea LG shows
extensive synteny with two or more chromosomes in
soybean (Fig. 2a), probably due to the independent
duplication event in the soybean genome. 39 A total
of 687 pigeonpea unique loci matched with
2480 soybean sequence stretches distributed on
different chromosomes of soybean (Glyma1 assembly;
Table 3). Maximum similarity was identified between
CcLG04 with Gm13 followed by CcLG02 with
Gm10, CcLG03 with Gm06 and Gm04,
CcLG06 with Gm01, CcLG10 with
Gm15, CcLG07 with Gm18, CcLG08 with Gm15,
CcLG11 with Gm17, CcLG01 with Gm18, CcLG09
with Gm08 and CcLG05 with Gm08.

In the case of cowpea, for which the genetic map was
used for the comparison, the fewest matches were observed
between pigeonpea and cowpea genomes. Only 57
unique pigeonpea loci showed synteny with 62 loci
on the cowpea map (Fig. 2b, Supplementary Table
S4). In the case of pigeonpea and Medicago, 228
unique pigeonpea loci showed significant matches
with 349 genomic regions on the Medicago
chromosomes (Fig. 2c, Supplementary Table S5). A total of
20 pigeonpea loci from CcLG02 showed similarity
to MtChr01 genomic regions. A similar number of
loci from CcLG03 showed similarity to MtChr03
genomic regions. CcLG04 showed almost equal
similarity to MtChr04 (21) and MtChr05 (18). Similarly,
loci from CcLG07 showed maximum matches to
MtChr07 followed by CcLG08 with MtChr02,
CcLG05 with MtChr08, CcLG06 with MtChr05 and
CcLG11 with MtChr04. In the comparison of
pigeonpea with Lotus, 216 pigeonpea unique loci matched
with 303 different genomic regions on the Lotus
chromosomes (Fig. 2d, Supplementary Table S6). In brief,
each LG of pigeonpea showed considerable synteny
with one or more chromosomes of Medicago and
Lotus. The distribution of similarity hits across eight
pigeonpea LGs varied from 2 (CcLG05) to 9
(CcLG09) while comparing with cowpea, from 14
(CcLG09) to 55 (CcLG04) with Medicago and from
18 (CcLG01) to 74 (CcLG03) with Lotus.



4. Discussion 

The current availability of more than 3000 PCR- 
based markers in pigeonpea 8,9 could not provide 
high or significant marker density in any of the popu- 
lations to be adequate to allow a thorough scan of the 
genome for QTL discovery, association analysis, map- 
based cloning and anchoring of the genome sequence 
with the genetic map. This can be attributed to the 
low level of polymorphism in Cajanus spp. as well as 
the small number of lines in the mapping populations 
used for developing the genetic maps. To overcome 
the above-mentioned problems to some extent, a 
new mapping population with 167 F2 lines, compared
with 72 lines used in the map of Bohra et al. 9, was
developed and used for developing the genetic map. 
Furthermore, SNP markers were targeted for develop- 
ing the cost-effective genotyping platform and devel- 
oping the genetic map. As the SNP markers were 
derived from genes, the comparison of the SNP- 
based genetic map of pigeonpea with the genome se- 
quence assemblies of soybean, Medicago and Lotus 
and the transcript map of cowpea provided the 
genome relationship of pigeonpea with the genomes 
of these legumes. 

When compared with the other marker systems, 
SNP markers are unique with regard to their amen- 
ability to high-throughput and low-cost (per data 
point) genotyping platforms. 17 In the case of
pigeonpea, a total of 17 113 SNPs were discovered after
comparing the transcript sequence reads from 12
parental lines of six different mapping populations
with the transcriptome and/or genome sequence of
pigeonpea. 6,13 To prove the usefulness of these
predicted polymorphisms for practical plant breeding
applications, validation of these SNPs is required. For
this purpose, a number of SNP genotyping platforms
such as GoldenGate, Infinium and KASPar assays are
available. However, in this study, due to its
cost-effective and flexible nature, the KASPar assay was
developed for 1616 SNPs. KASPar assays can be flexibly
used to validate any number of SNP markers on a 
desired range of accessions, unlike many other 
SNP genotyping platforms such as GoldenGate 
or Infinium assay, to produce a sufficient number of 
polymorphic markers in a given population to 
obtain a better coverage. In the published literature, 
three reports are available on the development of 
KASPar assays in crop plants. For instance, in the 
case of wheat, KASPar assays were developed for 
1114 SNPs and validated on 23 wheat varieties and 

Figure 2. Syntenic relationships of individual LGs of pigeonpea with other legumes. Each line radiating from a pigeonpea LG represents a
similarity match found in a block between pigeonpea and other legumes. (a) Pigeonpea LGs showing synteny with the genome
assembly of soybean, (b) pigeonpea LGs showing synteny with the cowpea transcript map, (c) pigeonpea LGs showing synteny with
the genome assembly of Medicago and (d) pigeonpea LGs showing synteny with the genome assembly of Lotus.



also used for integrating SNP markers into the genetic 
map of wheat. 26 In the case of common bean, KASPar
assays have been developed for 94 SNPs and used for
analysing genetic diversity in 70 accessions. 27 Very
recently, Hiremath et al. 28 developed KASPar assays for
2005 SNPs in chickpea and used these for genetic
diversity analysis and genetic mapping in chickpea and
comparative mapping in legumes. A comparison of
genotyping ~100 chickpea lines with ~500 SNPs
on GoldenGate and KASPar assays shows the
superiority of KASPar assays over GoldenGate assays
in terms of cost as well as time used. In summary,
all three studies underline the importance of
KASPar assays for SNP genotyping on a large scale
for genetics and breeding applications. In the
present study, though 1827 SNPs were attempted
for conversion into KASPar assays, only 1616
(88.4%) markers could be successfully converted.
The failure of the remaining SNP markers (11.6%) to
be validated is likely due to the presence of



Table 3. Detailed results on the comparison of mapped marker loci of pigeonpea with the soybean (G. max) genome

Glycine max chromosomes Gm01-Gm17 (Loci = pigeonpea unique loci, no.; - = no match)

LGs      Loci  Gm01  Gm02  Gm03  Gm04  Gm05  Gm06  Gm07  Gm08  Gm09  Gm10  Gm11  Gm12  Gm13  Gm14  Gm15  Gm16  Gm17
CcLG01     52     7    10    15     3     2     1    16    18     2     8    32     2     2     4     2     7     2
CcLG02    101     4    42     6     3     -    10     3     3     4    83     1     6    14    41     5     2    11
CcLG03     99     5     6    48    78     6    78     7     5     2     7     3     3     7     7     4     6     3
CcLG04     52     2     2     -     2     2    12     -     3     1     -    53    54   106     -     5     1     -
CcLG05     23     2     -     -     7    27     4    10    27     2     1     -     -     -     1     3     -     -
CcLG06     66    69    34     3     5     2     5     8     4    20     6    35     3     3     4     3     6     8
CcLG07     66     5     8     6    19     4    12    14    16    23     8     4     2     3     -     1    14     2
CcLG08     54     5     5     2     1     4     1    12    10    49     6     3     5    18     3    51    16     5
CcLG09     41     -     -     2     1    26     -     5    31    10     -     2     2    11     -     2     4     1
CcLG10     56     9     7     8    24     7    18     8    33     5     3    11    16    40     6    64     3     5
CcLG11     77     6     5     8     5    30     1    26    21    10     2     6     1    20     8     6    21    47
Total     687   114   119    98   148   110   142   109   171   128   124   150    94   224    74   146    80    84

Glycine max chromosomes Gm18-Gm20, scaffolds (pooled per LG) and row totals

LGs       Gm18  Gm19  Gm20  Scaffolds  Total
CcLG01      45     -     5          -    183
CcLG02       2     7    59          4    310
CcLG03       2    44     6          1    328
CcLG04       5     0     1          1    250
CcLG05       1     2     -          -     87
CcLG06       4     6     5          4    237
CcLG07      57    20     4          2    224
CcLG08       1     2     5          2    206
CcLG09       3    12     2          -    114
CcLG10      17     4     7          1    296
CcLG11      13     5     4          -    245
Total      150   102    98         15   2480

Scaffold totals: Scaffold_1129, 2; Scaffold_41, 3; Scaffold_42, 1; Scaffold_317, 2; Scaffold_96, 2; Scaffold_23, 2; Scaffold_1337, 1; Scaffold_1655, 1; Scaffold_90, 1.

The numbers shown in bold represent the highest matches between pigeonpea and soybean. 
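
The highest matches mentioned in the note can be recomputed from the hit counts themselves. The following is a minimal Python sketch, assuming the per-chromosome counts for four linkage groups as transcribed from the table (Gm01 to Gm20, with empty cells taken as 0); the names `HITS`, `CHROMOSOMES` and `top_matches` are illustrative and not part of the original study.

```python
# Hit counts per soybean chromosome (Gm01..Gm20) for four pigeonpea linkage
# groups, transcribed from the table; 0 stands for an empty cell.
# Scaffold columns are left out of this illustration.
HITS = {
    "CcLG02": [4, 42, 6, 3, 0, 10, 3, 3, 4, 83, 1, 6, 14, 41, 5, 2, 11, 2, 7, 59],
    "CcLG04": [2, 2, 0, 2, 2, 12, 0, 3, 1, 0, 53, 54, 106, 0, 5, 1, 0, 5, 0, 1],
    "CcLG07": [5, 8, 6, 19, 4, 12, 14, 16, 23, 8, 4, 2, 3, 0, 1, 14, 2, 57, 20, 4],
    "CcLG11": [6, 5, 8, 5, 30, 1, 26, 21, 10, 2, 6, 1, 20, 8, 6, 21, 47, 13, 5, 4],
}

CHROMOSOMES = [f"Gm{i:02d}" for i in range(1, 21)]


def top_matches(counts, k=2):
    """Return the k soybean chromosomes with the most hits, best first."""
    ranked = sorted(zip(CHROMOSOMES, counts), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for lg, counts in HITS.items():
        best = ", ".join(f"{chrom} ({n})" for chrom, n in top_matches(counts))
        print(f"{lg}: {best}")
```

Reporting the two best chromosomes rather than one is a design choice for this sketch: a linkage group may share a similar number of hits with more than one soybean chromosome.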
paralogous sequences, incorrect primer design
and/or the need to optimize PCR conditions. This
conversion rate is higher than that of the other
KASPar studies on wheat (67%) 26 and chickpea
(80.6%). 28 This rate of conversion from selected
SNPs to functional KASPar assays could probably be
increased by optimizing primer design and
amplification conditions. However, we made no
attempt to optimize the failed assays. For ease of
use, the developed KASPar assay-based markers were
designated as PKAMs.

Screening of a new set of SNP markers (KASPar 
assays) on a set of 24 diverse pigeonpea genotypes 
representing the parents of 14 mapping populations 
segregating for various economically important 
traits provides readily available polymorphic markers 
for developing genetic maps and undertaking trait 
mapping in the respective mapping populations. In 
fact, these crossing combinations were selected 
based on diversity revealed through trait phenotyping 
and SSR profiling data. 40 Although KASPar assays were 
developed for SNPs identified to be polymorphic 
between ICP 28 and ICPW 94, only 71.4% (1154) of the
markers showed polymorphism in the tested genotypes.
The remaining 28.6% (462) of the markers did not
show polymorphism at all, including between
ICP 28 and ICPW 94, indicating incorrect
SNP prediction. Out of the 462 monomorphic markers, 379
(82.1%) were identified based on the
Illumina GA IIx transcript sequence data, and the
remaining 83 (17.9%) were identified based
on allele re-sequencing of TOGs. This emphasizes the
need for stringent selection criteria and the validation
of in silico identified SNPs via allele re-sequencing. 28 In
brief, this study provides a list of polymorphic markers
for different mapping populations that segregate for a
number of traits important for pigeonpea improvement, such as
Fusarium wilt, sterility mosaic disease and fertility restoration.
The number of polymorphic markers identified in this study in
intra-specific mapping populations was low (up to
55 markers in a given cross; Supplementary Table S1);
however, these polymorphic markers would be helpful
in enriching the recently developed SSR-based genetic
linkage maps of intra-specific mapping populations. 14
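
The proportions quoted in this paragraph follow directly from the marker counts. As a minimal Python sketch (the constant and function names are illustrative assumptions, not part of the original analysis), the arithmetic can be checked as follows:

```python
# Marker counts quoted in the text.
TOTAL_ASSAYS = 1616          # KASPar assays (PKAMs) screened
POLYMORPHIC = 1154           # polymorphic in the 24 tested genotypes
MONOMORPHIC = 462            # no polymorphism detected
MONO_FROM_TRANSCRIPTS = 379  # monomorphic, predicted from transcript sequence data
MONO_FROM_TOGS = 83          # monomorphic, predicted from allele re-sequencing of TOGs


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    print("polymorphic:", percent(POLYMORPHIC, TOTAL_ASSAYS), "%")
    print("monomorphic:", percent(MONOMORPHIC, TOTAL_ASSAYS), "%")
    print("monomorphic, transcript-derived:",
          percent(MONO_FROM_TRANSCRIPTS, MONOMORPHIC), "%")
    print("monomorphic, TOG-derived:",
          percent(MONO_FROM_TOGS, MONOMORPHIC), "%")
```

The first two values reproduce the quoted 71.4% and 28.6%. The two subset shares come out within about 0.1 percentage point of the quoted 82.1% and 17.9%, a difference that is probably due to rounding.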

The present study reports a comprehensive genetic
map comprising 875 loci by using 167 F2 individuals
derived from ICP 28 x ICPW 94. Initial construction of
a skeletal map with un-skewed markers, followed
by the integration of distorted markers, helped in minimizing
the possibility of spurious assignments of
markers. 9 Eight hundred and seventy-five marker
loci were mapped on 11 LGs corresponding to the
11 chromosome pairs of the pigeonpea genome.
The total length of the map was 967.03 cM and the
average marker spacing was 1.11 cM. The current
pigeonpea linkage map is a considerable improvement
over the previous pigeonpea genetic linkage
maps using SSR and DArT markers. 9,12 The marker
density in the current map is almost three times 
higher than that in the previous maps. This higher 
marker density would be useful in determining 
double recombinants affecting a single marker. SNP 
genotyping using KASPar assays resulted in a much 
lower genotyping error rate than that obtained with 
markers like SSRs. In addition, SNP markers showing 
null alleles or an excess number of double recombi- 
nants were removed from the analyses. Owing to 
this careful error checking, the current map shows 
an increase in the total marker density compared 
with the previous maps developed by using SSR and 
DArT markers. 9,12 However, we noticed that in
two LGs (i.e. CcLG02 and CcLG06), a few inter-marker
intervals were larger than 20 cM. Therefore, more
markers need to be developed to fill these
gaps and increase the density of
the linkage map. Prior to this map, Bohra et al. 9
developed an SSR marker-based map comprising
239 loci by using 72 F2 lines derived from ICP 28 x
ICPW 94. To develop a consensus map, the newly
developed mapping population (ICP 28 x ICPW 94) 
was also genotyped with targeted SSRs and an inte- 
grated genetic linkage map covering 996.21 cM was 
constructed. With the help of a recently available 
pigeonpea draft genome sequence, 13 efforts are 
underway to develop a large number of SSR and 
SNP markers. Therefore, this map should serve as a 
'reference map' for other future genetic maps of 
pigeonpea. Moreover, as the SNP markers are 
derived from the transcriptome sequences, these 
markers and the map would be very useful for 
linking the future genetic maps and the genome sequence
of pigeonpea. Apart from the markers that were polymorphic
between the parents of intra-specific
mapping populations, most of the PKAMs mapped
in the inter-specific mapping population
were monomorphic among the cultivated parental lines.
However, these mapped loci provide a resource
that can be used for conducting association analysis
and linkage disequilibrium estimation in pigeonpea
germplasm.
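
The reported average marker spacing can be reproduced from the map length and the number of loci. The sketch below is a minimal Python illustration; the function names are assumptions, and the per-interval variant is shown only as an alternative convention, not as the method used in the study.

```python
# Map statistics quoted in the text.
MAP_LENGTH_CM = 967.03   # total length of the PKAM map
N_LOCI = 875             # mapped marker loci
N_LINKAGE_GROUPS = 11    # linkage groups (LGs)


def spacing_per_locus(length_cm: float, n_loci: int) -> float:
    """Average spacing as map length divided by the number of loci."""
    return length_cm / n_loci


def spacing_per_interval(length_cm: float, n_loci: int, n_groups: int) -> float:
    """Alternative convention: divide by the number of intervals.

    Each linkage group with m loci has m - 1 intervals, so the whole map
    has n_loci - n_groups intervals.
    """
    return length_cm / (n_loci - n_groups)


if __name__ == "__main__":
    print(round(spacing_per_locus(MAP_LENGTH_CM, N_LOCI), 2))
    print(round(spacing_per_interval(MAP_LENGTH_CM, N_LOCI, N_LINKAGE_GROUPS), 2))
```

Dividing the map length by the number of loci gives 1.11 cM, which matches the quoted figure; the per-interval convention gives a slightly larger value of 1.12 cM.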

The recent development of a large data set of crop 
genomic sequences has aided in global gene predic- 
tions as well as in the identification of sequences im- 
portant in gene regulation. In addition, comparative 
sequence analysis of crops is poised to contribute to 
the exploration of the genetic bases for differences 
and similarities among species. We are likely at last 
to understand the genetic explanation of how species 
have adapted to perform their shared or unique bio- 
logical functions. Analysis of the sequences containing 
SNPs mapped onto the pigeonpea genetic map against
two model legume species (Lotus and Medicago)
and two crop legume species (soybean and cowpea) 
showed maximum similarity to soybean sequences 
(2480 soybean sequences). In general, markers 
mapped on one LG hit sequences from several chromosomes
of soybean, suggesting the occurrence of
chromosomal rearrangements in both genomes.
As soybean has undergone a recent whole-genome
duplication, 39 almost all the pigeonpea sequences
showed two to three different hits with soybean
genome sequences. This also supports the taxonomic classification
of pigeonpea and soybean, which lie in close
proximity under subfamily Papilionoideae. 4 ^ With the
model legume species, sequence similarities of 39.8% (Medicago) and 34.6%
(Lotus) were identified, whereas
in the case of cowpea, only 7.1% of pigeonpea sequences
gave similarity results. This may be attributed
to the fact that for sequence similarity analysis with 
pigeonpea sequences, genome assemblies of soybean, 
Medicago and Lotus were used, and during the time of 
analysis, genome sequence information was not avail- 
able for cowpea; hence, the analysis was done by com- 
paring with the high-density linkage map developed by 
Muchero et al. 23 We found that blocks in
each of the 11 pigeonpea LGs were syntenic to
counterpart chromosomes of the four legumes,
implying a degree of colinearity for the syntenic chromosome/linkage
group pairs. The conserved sequences identified
among five legumes (pigeonpea, soybean,
cowpea, Medicago and Lotus) and the data from the 
comparative genome analysis should facilitate studies 
on genome evolution and analysis of the structural 
genome, but more importantly should facilitate the 
functional inference of genes in pigeonpea. The deter- 
mination of gene functions is difficult in non-model 
species including pigeonpea; thus, functional genome 
analysis will have to rely heavily on the establishment 
of orthologies from model species. 

In summary, this study provides an extensive resource
of SNPs, their conversion into cost-effective
KASPar assays and their application in constructing
a dense genetic map in pigeonpea and in similarity
analysis across five legumes. The developed genetic 
map is the most comprehensive genetic map for 
pigeonpea based on a single mapping population. 
Through mapped SNPs, we have identified complex 
syntenic relationships between soybean and pigeon- 
pea by comparative genomics analysis. We consider 
that it will be possible for pigeonpea breeders to 
attain one of their most important goals, to rapidly 
and economically genotype thousands of accessions 
with a large and flexible number of markers. Although 
this is the first reported large-scale SNP mapping
effort in pigeonpea, a larger number of informative
SNPs mapped at minimum intervals will be necessary
for broader applications. The extensive genomics
resources developed by the whole-genome sequencing
of pigeonpea, coupled with future re-sequencing
of multiple breeding lines, promise to considerably
increase the number of informative SNPs, permitting
exceptional levels of precision genetic analysis in
pigeonpea breeding.